Figure 23-2a shows a typical survival curve. It’s not defined by any algebraic formula. It just
graphs the table of values obtained by a life-table or Kaplan-Meier calculation.
Figure 23-2b shows how the baseline survival curve is flexed by raising every baseline survival
value to a power. You get the lower curve by setting h = 2 and squaring every baseline survival
value. You get the upper curve by setting h = 0.5 and taking the square root of every baseline
survival value. Notice that the two flexed curves keep all the distinctive zigs and zags of the
baseline curve: every step occurs at the same time value as it does in the baseline curve.
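The flexing operation can be sketched in a few lines of Python. This is only an illustration: the baseline table below is made up (it is not the data behind Figure 23-2), and the function name flex is a hypothetical helper, not part of any statistics package.

```python
# Hypothetical baseline survival table: (time, survival probability) pairs,
# the kind of step values a life-table or Kaplan-Meier calculation produces.
baseline = [(0, 1.00), (2, 0.90), (5, 0.75), (9, 0.60), (14, 0.40)]


def flex(curve, h):
    """Raise every survival value to the power h; the time values are untouched."""
    return [(t, s ** h) for t, s in curve]


lower = flex(baseline, 2)    # squaring every value gives the lower (worse) curve
upper = flex(baseline, 0.5)  # taking square roots gives the upper (better) curve

for (t, s0), (_, s_lo), (_, s_hi) in zip(baseline, lower, upper):
    print(f"t={t:>2}  baseline={s0:.3f}  h=2: {s_lo:.3f}  h=0.5: {s_hi:.3f}")
```

Because only the survival values change, every step in the flexed curves stays at the same time value as in the baseline table.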
The lower curve represents a group of participants who had a worse survival outcome than
those making up the baseline group. This means that at any instant in time, they were
somewhat more likely to die than a baseline participant at that same moment. Another way
of saying this is that the participants in the lower curve have a higher hazard rate than the
baseline participants.
The upper curve represents participants who had better survival than a baseline person at
any given moment — meaning they had a lower hazard rate.
There is a mathematical relationship between the chance of dying at any instant in time,
which is called hazard, and the chance of surviving up to some point in time, which is called survival.
Survival is the exponential of the negative accumulated hazard, so raising the survival curve to the
h power is exactly equivalent to multiplying the hazard curve by h. Because every point in the hazard
curve is being multiplied by the same amount, h, raising a survival curve to a power is referred to
as a proportional hazards transformation.
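A quick numerical check can make the link between survival and hazard concrete. The sketch below relies on the standard identity that the cumulative (accumulated) hazard equals the negative natural logarithm of survival. The survival values and the helper name cumulative_hazard are assumptions chosen for illustration.

```python
import math

# Hypothetical baseline survival probabilities at a few time points.
baseline_survival = [0.90, 0.75, 0.60, 0.40]
h = 2  # the power used to flex the curve


def cumulative_hazard(survival):
    """Cumulative hazard implied by a survival probability: H = -ln(S)."""
    return -math.log(survival)


# Ratio of flexed to baseline cumulative hazard at each time point.
ratios = [
    cumulative_hazard(s0 ** h) / cumulative_hazard(s0)
    for s0 in baseline_survival
]

for s0, ratio in zip(baseline_survival, ratios):
    print(f"S0={s0:.2f}  flexed S={s0 ** h:.4f}  hazard ratio={ratio:.6f}")
```

The ratio is the same at every time point, which is what "proportional" means here: the flexed group's hazard is a constant multiple of the baseline hazard.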
But what should the value of h be? The h value varies from one individual to another. Keep in
mind that the baseline curve describes the survival of a perfectly average participant, but no
individual is completely average. You can think of every participant in the data as having her very
own personalized survival curve, based on her very own h value, that provides the best estimate
of that participant’s chance of survival over time.
Seeing how predictor variables influence h
The final piece of the PH regression puzzle is to figure out how the predictor variables influence h,
which influences survival. As you likely know, all regression procedures estimate the values of the
coefficients that make the predicted values agree as much as possible with the observed values. For
PH regression, the software estimates the coefficients of the predictor variables that make the
predicted survival curves agree as much as possible with the observed survival times of each
participant.
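As a rough sketch of how predictors could translate into a personalized curve, the example below assumes the usual Cox proportional hazards form, in which h is the exponential of a weighted sum of each predictor's deviation from its baseline (average) value. The coefficients, the averages, and the function names h_value and personal_survival are all hypothetical; real coefficients come from the regression software.

```python
import math

# Hypothetical regression coefficients and baseline (average) predictor values.
coefs = {"age": 0.04, "treated": -0.60}
means = {"age": 60.0, "treated": 0.5}


def h_value(person):
    """h = exp(sum of coefficient * (value - baseline average)); assumed Cox form."""
    linear_predictor = sum(coefs[k] * (person[k] - means[k]) for k in coefs)
    return math.exp(linear_predictor)


def personal_survival(baseline_s, person):
    """Flex one baseline survival value by this participant's own h."""
    return baseline_s ** h_value(person)


average_person = dict(means)                  # h is 1, so the curve is unchanged
older_untreated = {"age": 70.0, "treated": 0.0}

print(h_value(average_person), personal_survival(0.8, average_person))
print(h_value(older_untreated), personal_survival(0.8, older_untreated))
```

Under these made-up coefficients, a perfectly average participant has h = 1 and keeps the baseline curve, while an older, untreated participant has h greater than 1 and a lower predicted survival.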
How does PH regression determine these regression coefficients? The short answer is,
“You’ll be sorry you asked!” The longer answer is that, like all other kinds of regression, PH
regression is based on maximum likelihood estimation. The software uses the data to build a long,
complicated expression for the probability of one particular individual in the data dying at any